Abstract

Markov Random Field (MRF) models are powerful tools for contextual modeling. However, little is known about how the
spatial dependence between their elements is encoded in terms of statistical information, more precisely, information-theoretic
measures. In this paper, we shed light on the connection between Fisher information, Shannon entropy and spatial properties of the random
field in the case of Gaussian random variables (a Gaussian Markov random field, or simply GMRF), by deriving analytical expressions
to compute local and global versions of these measures using Besag's pseudo-likelihood function (conditional independence
assumption). In addition, we use the derived expressions to obtain an exact expression for the asymptotic variance of the maximum
pseudo-likelihood estimator of the spatial dependence parameter, showing that, since information equality fails, it is not possible
to define a lower bound (Cramér-Rao limit). Moreover, the results indicate that accuracy in the estimation of the spatial
dependence structure of GMRFs depends essentially on the massive presence of contextual patterns satisfying two intuitive
conditions: high local log-likelihood value (minimization of type-I Fisher information), which means concentration of patterns
that are likely to be observed, and high local log-likelihood curvature (maximization of type-II Fisher information), which means
that small perturbations in the data cannot cause abrupt changes in the spatial dependence structure.

Index Terms 

I. Introduction

Information-theoretic measures play a fundamental role in a huge variety of applications, since they represent statistical
knowledge in a systematic, elegant and formal framework. Since the first works of Shannon [1], and later with many other
generalizations [2]-[4], the concept of entropy has been adapted and successfully applied to almost every field of science,
among which we can cite physics [5], mathematics [6]-[8], economics [9] and, fundamentally, information theory [10]-[12].
Similarly, the concept of Fisher information has been shown to reveal important properties of statistical procedures, from lower
bounds on estimation methods [13]-[15] to information geometry [16], [17]. Basically, Fisher information can be thought of as
the likelihood analog of entropy, which is a probability-based measure of uncertainty.

In general, classical statistical inference is focused on capturing information about location and dispersion of unknown
parameters of a given family of distributions and how this information is related to uncertainty in estimation procedures. Within
this context, the exponential family and the independence hypothesis are often assumed, giving the likelihood function a series of
desirable mathematical properties [13]-[15].

Although mathematically convenient, in many applications such as image processing and spatial data mining, the independence
assumption is unrealistic [18]. In this scenario, Markov Random Field (MRF) models appear as a natural generalization
of the classical model, obtained by simply replacing the independence assumption with the conditional independence assumption.
Roughly speaking, in every MRF, knowledge of a finite-support neighborhood around a given variable isolates it from all the
remaining variables. A further simplification is to consider a pairwise interaction model, constraining the size of the maximum
clique to be two. Moreover, if the MRF model is isotropic, which means the spatial dependence parameter is the same in all
directions, all information regarding its spatial dependence structure is conveyed by a single parameter, from now on denoted
by $\beta$.

In this paper, we assume an isotropic pairwise Gaussian Markov Random Field (GMRF) model [19], [20] (also known as
auto-normal model or conditional auto-regressive model [21], [22]). The question that motivated this work, and that we
try to elucidate here, is: What kind of information is encoded by the $\beta$ parameter in such a model? We want to know
how this parameter, and consequently the whole spatial dependence structure of the random field, is related to both local and
global information-theoretic measures, more precisely the observed and expected Fisher information as well as self-information
and Shannon entropy.

In searching for answers to our fundamental question, investigations led us to an exact expression for the asymptotic
variance of the maximum pseudo-likelihood (MPL) estimator of the spatial dependence parameter in a pairwise GMRF model,
indicating that asymptotic efficiency is not guaranteed, since information equality fails. An approximation for the asymptotic
variance of the spatial dependence parameter using the observed Fisher information has been proposed in [23]. Here, however,
we use the expected Fisher information as it appears in the Cramér-Rao lower bound. To the best of our knowledge, closed-form
expressions for the expected Fisher information in the GMRF model have not been derived before.

The remainder of the paper is organized as follows: Section 2 discusses maximum pseudo-likelihood (MPL) estimation and
provides derivations for the expected Fisher information regarding the spatial dependence parameter $\beta$ using both first and
second derivatives of the pseudo-likelihood function in a pairwise isotropic GMRF model. Intuitive interpretations for these
two measures are also discussed. In Section 3 we show an expression for the global entropy in a GMRF model (a probability-based
counterpart to Fisher information), given by the expected value of self-information, a local uncertainty measure based on
the observation of a contextual configuration pattern defined by a Markovian neighborhood. The results suggest a connection
between maximum pseudo-likelihood and minimum entropy conditions in an MRF. Section 4 presents an exact expression for
the asymptotic variance of the MPL estimator of $\beta$ as a ratio of both forms of Fisher information, showing that accuracy in
the estimation of $\beta$ depends essentially on the massive presence of contextual patterns satisfying two intuitive conditions: high
local likelihood (minimization of one form of Fisher information) and stability, that is, the local log-likelihood is not flat, which
means that small variations in the data cannot cause abrupt changes in the spatial dependence structure (maximization of another
form of Fisher information). Finally, Section 5 presents the conclusions, final remarks and possibilities for future work.

II. Fisher Information on Pairwise GMRFs

The remarkable Hammersley-Clifford theorem [24] states the equivalence between Gibbs Random Fields (GRF) and Markov
Random Fields (MRF), which implies that any MRF can be defined either in terms of a global (joint Gibbs distribution) or a
local (set of local conditional density functions) model. For our purposes, we choose the latter representation.

Let $X = \{x_1, x_2, \ldots, x_n\}$ be a set of Gaussian random variables defined on a rectangular lattice and $\eta$ a non-causal
neighborhood system. A GMRF is completely characterized by a set of $n$ (number of variables) local conditional density
functions (LCDFs), given by [25]:

$$ p(x_i \mid \eta_i, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2} \left[ x_i - \mu - \beta \sum_{j \in \eta_i} (x_j - \mu) \right]^2 \right\} \tag{1} $$

where $\theta = (\mu, \sigma^2, \beta)$ is the vector of parameters, with $\mu$ and $\sigma^2$ denoting the mean (expected value) and variance, $\beta$ denoting the
spatial dependence parameter and $\eta_i$ representing the neighborhood around the $i$-th random variable in the field. It is interesting
to note that for $\beta = 0$, the expression degenerates to a Gaussian density. From an information geometry perspective [16], [17],
it means we are constrained to a sub-manifold within the Riemannian manifold of probability distributions, where the natural
metric is given by the Fisher information. It has been shown that the geometric structure of exponential family distributions
exhibits constant curvature. However, little is known about information geometry on more general statistical models, such as
Gaussian Markov Random Fields.
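
As a minimal illustration, the sketch below evaluates the logarithm of one local conditional density in Python. It assumes the usual auto-normal form, in which $x_i$ given its neighborhood is Gaussian with mean $\mu + \beta \sum_{j \in \eta_i} (x_j - \mu)$ and variance $\sigma^2$. The function names and the argument layout are illustrative choices, not part of the original formulation.

```python
import math

def local_log_density(x_i, neighbors, mu, sigma2, beta):
    """Log of the LCDF of a pairwise isotropic GMRF at a single site.

    `neighbors` holds the values x_j of the neighborhood eta_i
    (assumed layout: a plain list of floats).
    """
    # Conditional mean: mu plus beta times the summed neighbor deviations.
    cond_mean = mu + beta * sum(x_j - mu for x_j in neighbors)
    residual = x_i - cond_mean
    return -0.5 * math.log(2.0 * math.pi * sigma2) - residual ** 2 / (2.0 * sigma2)

def gaussian_log_density(x, mu, sigma2):
    """Log-density of an ordinary univariate Gaussian, used as a reference."""
    return -0.5 * math.log(2.0 * math.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2)
```

With `beta = 0.0` the two functions agree for any choice of neighbors, which is the degenerate Gaussian case mentioned in the text.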



A. Maximum Pseudo-Likelihood Estimation 

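Maximum likelihood estimation is intractable in MRF parameter estimation due to the existence of the partition function in
the joint Gibbs distribution. An alternative, proposed by Besag [21], is maximum pseudo-likelihood estimation, which is based
on the conditional independence principle. The pseudo-likelihood function is defined as the product of the LCDFs. So, for a
GMRF model, the log pseudo-likelihood function is defined by the following:

$$ \log PL(\theta) = -\frac{n}{2} \log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left[ x_i - \mu - \beta \sum_{j \in \eta_i} (x_j - \mu) \right]^2 \tag{2} $$

By differentiating equation (2) with respect to each parameter and properly solving the pseudo-likelihood equations, one
obtains the following MPL estimators:

$$ \hat{\mu}_{MPL} = \frac{1}{n(1 - k\beta)} \sum_{i=1}^{n} \left( x_i - \beta \sum_{j \in \eta_i} x_j \right) \tag{3} $$

$$ \hat{\sigma}^2_{MPL} = \frac{1}{n} \sum_{i=1}^{n} \left[ x_i - \mu - \beta \sum_{j \in \eta_i} (x_j - \mu) \right]^2 \tag{4} $$

$$ \hat{\beta}_{MPL} = \frac{\sum_{i=1}^{n} \left[ (x_i - \mu) \sum_{j \in \eta_i} (x_j - \mu) \right]}{\sum_{i=1}^{n} \left[ \sum_{j \in \eta_i} (x_j - \mu) \right]^2} = \frac{\sum_{j \in \eta_i} \hat{\sigma}_{ij}}{\sum_{j \in \eta_i} \sum_{k \in \eta_i} \hat{\sigma}_{jk}} \tag{5} $$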
where $k$ denotes the cardinality of the non-causal neighborhood set $\eta_i$, $\hat{\sigma}_{ij}$ is the sample covariance between the central element
$x_i$ and a neighbor $x_j$, and $\hat{\sigma}_{jk}$ is the sample covariance between two elements $x_j$ and $x_k$ belonging to the neighborhood
system. Furthermore, assuming a homoscedastic conditional autoregressive (CAR) model, that is, $\sigma_i^2 = \sigma_j^2 = \sigma^2$, $\forall j \in \eta_i$,
$i = 1, 2, \ldots, n$, we have:

$$ \hat{\beta}_{MPL} = \frac{\sum_{j \in \eta_i} \rho_{ij}}{\sum_{j \in \eta_i} \sum_{k \in \eta_i} \rho_{jk}} \tag{6} $$
where $-1 \le \rho_{ij} \le 1$ denotes the Pearson correlation coefficient between the observed variable $x_i$ and a variable $x_j$ from
the neighborhood system (typical choices are first or second order systems, which correspond to the four and eight nearest
neighbors on a rectangular two-dimensional lattice). This formula gives us an idea of the range of $\beta$ (since MPL estimators are
consistent) by specifying an upper bound for $\hat{\beta}_{MPL}$. When the elements are completely correlated, we have $\hat{\beta}_{MPL} \to 1/k$,
where $k$ is the number of neighbors. Thus, for example, considering a first-order neighborhood system defined on a lattice,
the limiting value for the MPL estimate is expected to be $\hat{\beta}_{MPL} = 0.25$. Similarly, for a second-order system, we would
have $\hat{\beta}_{MPL} = 0.125$.
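
The limiting value $1/k$ can be checked numerically. The following Python sketch computes the MPL estimate of $\beta$ as the ratio between $\sum_i (x_i - \mu) \sum_{j \in \eta_i} (x_j - \mu)$ and $\sum_i [\sum_{j \in \eta_i} (x_j - \mu)]^2$ for a first-order neighborhood. Two simplifications are assumptions of this sketch and not taken from the text: border sites are skipped, and $\mu$ is replaced by the plain sample mean of the field. The function name is hypothetical.

```python
def mpl_beta_first_order(field):
    """MPL estimate of beta on a 2D lattice with the 4 nearest neighbors.

    Simplifying assumptions (not from the source): border sites are skipped
    and mu is replaced by the plain sample mean of the whole field.
    """
    rows, cols = len(field), len(field[0])
    values = [v for row in field for v in row]
    mu = sum(values) / len(values)
    num = 0.0  # accumulates (x_i - mu) * sum_j (x_j - mu)
    den = 0.0  # accumulates [sum_j (x_j - mu)]^2
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = field[r][c] - mu
            s = (field[r - 1][c] + field[r + 1][c]
                 + field[r][c - 1] + field[r][c + 1]) - 4.0 * mu
            num += center * s
            den += s * s
    return num / den

# A horizontal ramp: every interior site equals the average of its neighbors.
ramp = [[float(c) for c in range(5)] for _ in range(5)]
print(mpl_beta_first_order(ramp))  # 0.25, i.e. 1/k with k = 4
```

The ramp is a deterministic stand-in for the completely correlated case: each interior value is exactly the mean of its four neighbors, so the estimate reaches the bound $1/k = 0.25$.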

In this sense, the spatial dependence parameter $\beta = \beta_j$, $j = 1, 2, \ldots, k$, plays the role of $k$ identical autoregressive coefficients
(coefficients of a linear combination). This definition in terms of correlation coefficients, together with the upper bound, guarantees that
the resulting linear combination belongs to the interval $[\min\{x_j\}, \max\{x_j\}]$ (in the case of centered data, $\mu = 0$). This
observation is relevant, especially in Markov Chain Monte Carlo (MCMC) simulation, since the use of $\beta > 1/k$ causes the
local conditional expectations $E[x_i \mid \eta_i]$ to diverge, because we are sequentially producing new elements outside the boundaries of this interval.
Note also that, if $\beta = 0$, the MPL estimators of both $\mu$ and $\sigma^2$ become the widely known sample mean and sample variance.



B. Fisher information of spatial dependence parameters 

Basically, Fisher information measures the amount of information the observation of a random variable conveys about an
unknown parameter. It can be thought of as the likelihood analog of entropy, which is a probability-based measure of uncertainty.
Often, when we are dealing with independent and identically distributed (i.i.d.) random variables, the computation of the
global Fisher information present in a random sample $X = \{x_1, x_2, \ldots, x_n\}$ is quite straightforward, since each observation
$x_i$, $i = 1, 2, \ldots, n$, brings exactly the same information. However, this is not true for spatial dependence parameters, since
different configuration patterns provide distinct contributions to the sample observed Fisher information, which is a reasonable
approximation to the expected Fisher information [26].



1) Observed Fisher information: Considering an MRF defined by a set of LCDFs, the observed Fisher information can be
calculated in terms of the pseudo-likelihood equation, from the square of the first derivative of the log pseudo-likelihood with respect to $\beta$ (type-I Fisher information),
and it can be estimated by the following, justified by the Law of Large Numbers:

$$ \phi_\beta = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{\partial}{\partial \beta} \log p(x_i \mid \eta_i, \theta) \right]^2 \tag{8} $$

where $p(x_i \mid \eta_i, \theta)$ is the LCDF of the Markovian model. Thus, $\phi_\beta$ is an unbiased estimator of the observed Fisher information,
making it a good approximation to that quantity. Replacing equation (1) in (8), after some
manipulations, a closed-form expression for the observed Fisher information, $\phi_\beta$, in the pairwise GMRF model is given by the
following:

$$ \phi_\beta = \frac{1}{n} \sum_{i=1}^{n} \phi_\beta(x_i) = \frac{1}{n\sigma^4} \sum_{i=1}^{n} \left\{ \left[ x_i - \mu - \beta \sum_{j \in \eta_i} (x_j - \mu) \right] \sum_{j \in \eta_i} (x_j - \mu) \right\}^2 \tag{9} $$

Fig. 1. Simple geometrical interpretation for local Fisher information minimization: by changing the value of a variable in order to minimize its local Fisher
information, a more likely configuration pattern arises (for the global spatial dependence parameter).

Note that $\phi_\beta$ is simply an average of the local observed Fisher information along the random field. Thus, we can think of
$\phi_\beta(x_i)$, for $i = 1, 2, \ldots, n$, as being the information that a particular contextual spatial pattern provides as contribution to
the global observed Fisher information. In this sense, the observed Fisher information is explicitly defined in terms of local
measures. Note the similarity between $\phi_\beta(x_i)$ and self-information, $-\log p(x_i)$. Basically, the main difference is that while
the former is based on the likelihood, the latter is based on the probability.

Alternatively, one can compute the observed Fisher information from the negative of the second derivative of the log pseudo-likelihood
with respect to $\beta$ (type-II Fisher information), resulting in the following approximation:

$$ \psi_\beta = \frac{1}{n} \sum_{i=1}^{n} \psi_\beta(x_i) = \frac{1}{n\sigma^2} \sum_{i=1}^{n} \left[ \sum_{j \in \eta_i} (x_j - \mu) \right]^2 \tag{11} $$

Note that while $\phi_\beta$ is a function of $\beta$, $\psi_\beta$ does not depend explicitly on the spatial dependence parameter. Once again, $\psi_\beta$
is the average of another local Fisher information measure, $\psi_\beta(x_i)$, along the entire random field.

Therefore, with these two local measures, $\phi_\beta(x_i)$ and $\psi_\beta(x_i)$, we can assign an information value to every element of
an MRF. A relevant question concerns the interpretation of these local information values. Roughly speaking, $\phi_\beta(x_i)$ is the
quadratic rate of change of the local likelihood, which means that observations showing low values of $\phi_\beta(x_i)$ are very likely to
occur throughout the field (they are close to the maximum of the local likelihood). In other words, these patches are "aligned"
with the expected behavior. On the other hand, observations showing high values of $\phi_\beta(x_i)$ are typically landmarks, because
they bring a significant amount of information about the global spatial dependence structure, since they are not likely to
occur for that particular value of $\beta$ (informative patches). Therefore, local Fisher information minimization in MRFs produces
configuration patterns or patches that are more likely to occur. Basically, $\phi_\beta(x_i)$ tells us how informative a given patch is.

Recent works in the signal and image processing literature have investigated the use of this criterion in Gaussian
denoising by considering a tradeoff between local mean square error (MSE) minimization and local observed Fisher information
minimization, showing promising results [27]. Figure 1 depicts a simple geometrical illustration of the local Fisher information
minimization process, showing that patterns exhibiting low values of $\phi_\beta(x_i)$ are close to a maximum of the local likelihood.

Regarding $\psi_\beta(x_i)$, informally speaking, it can be interpreted as a curvature measure. Thus, points showing low values of
this measure have a nearly flat local likelihood, which means that small perturbations on this set of points can cause a sharp
change in the $\beta$ parameter, and consequently in the global spatial dependence structure. On the other hand, if we have many patches
exhibiting large values of $\psi_\beta(x_i)$, a change in the global spatial structure is unlikely to happen. These rather informal arguments
define the basis for understanding the meaning of the asymptotic variance of maximum pseudo-likelihood estimators, as
will be discussed in the next Sections. Basically, $\psi_\beta(x_i)$ is a measure of how sure we are about the local spatial dependence
structure (at a given point $x_i$), since a high average curvature is desired for better accuracy of the MPL estimator of $\beta$.
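
The two local measures can be sketched in Python for a single site. The code assumes the auto-normal LCDF, whose log is quadratic in $\beta$, so that the first derivative with respect to $\beta$ is $[x_i - \mu - \beta s_i] s_i / \sigma^2$ and the negative second derivative is $s_i^2 / \sigma^2$, with $s_i = \sum_{j \in \eta_i} (x_j - \mu)$. The function name and the return convention are illustrative assumptions.

```python
def local_fisher_information(x_i, neighbors, mu, sigma2, beta):
    """Return (phi, psi): local type-I and type-II Fisher information at a site.

    phi is the squared derivative of the local log-density w.r.t. beta;
    psi is the negative second derivative (local curvature).
    """
    s = sum(x_j - mu for x_j in neighbors)   # summed neighbor deviations
    residual = x_i - mu - beta * s           # deviation from the conditional mean
    score = residual * s / sigma2            # d/d(beta) of log p(x_i | eta_i)
    phi = score ** 2
    psi = s * s / sigma2                     # does not depend on beta
    return phi, psi

# A patch perfectly "aligned" with beta = 1/4: zero type-I information,
# but a non-flat local log-likelihood.
print(local_fisher_information(1.0, [1.0, 1.0, 1.0, 1.0], 0.0, 1.0, 0.25))  # (0.0, 16.0)
```

In this example the patch sits exactly at the maximum of its local likelihood (`phi` is zero), while `psi` is large, which corresponds to the favorable situation described above: a likely pattern with high curvature.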
2) Expected Fisher information: Unlike the observed Fisher information, the expected Fisher information is strictly a global
measure. It is defined by the expected value of the squared score function:

$$ \Phi_\beta = E\left[ \left( \frac{\partial}{\partial \beta} \log PL(\theta) \right)^2 \right] \tag{12} $$

or equivalently, in classical inference (exponential family, i.i.d. random variables), by taking the expectation of the negative of
the second derivative, which can be interpreted as an average curvature:

$$ \Psi_\beta = -E\left[ \frac{\partial^2}{\partial \beta^2} \log PL(\theta) \right] \tag{13} $$



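In the following, closed-form expressions for both $\Phi_\beta$ and $\Psi_\beta$ in the GMRF model are presented, showing that, in general,
the information equality $\Phi_\beta = \Psi_\beta$ fails. From (12), considering the contribution of a generic site $x_i$, after some algebra we obtain the following expression, which is composed
of three main terms:

$$ \Phi_\beta = \frac{1}{\sigma^4} E\left\{ \left[ x_i - \mu - \beta \sum_{j \in \eta_i} (x_j - \mu) \right]^2 \left[ \sum_{j \in \eta_i} (x_j - \mu) \right]^2 \right\} $$

$$ = \frac{1}{\sigma^4} \left\{ E\left[ (x_i - \mu)^2 \sum_{j \in \eta_i} \sum_{k \in \eta_i} (x_j - \mu)(x_k - \mu) \right] - 2\beta E\left[ (x_i - \mu) \sum_{j \in \eta_i} \sum_{k \in \eta_i} \sum_{l \in \eta_i} (x_j - \mu)(x_k - \mu)(x_l - \mu) \right] + \beta^2 E\left[ \sum_{j \in \eta_i} \sum_{k \in \eta_i} \sum_{l \in \eta_i} \sum_{m \in \eta_i} (x_j - \mu)(x_k - \mu)(x_l - \mu)(x_m - \mu) \right] \right\} \tag{14} $$

Expanding the first term of the previous expression gives us:

$$ E\left[ (x_i - \mu)^2 \sum_{j \in \eta_i} \sum_{k \in \eta_i} (x_j - \mu)(x_k - \mu) \right] = \sum_{j \in \eta_i} \sum_{k \in \eta_i} E\left[ (x_i - \mu)(x_i - \mu)(x_j - \mu)(x_k - \mu) \right] \tag{15} $$

But, according to Isserlis' theorem [28], for zero-mean jointly Gaussian variables we have the following identity:

$$ E[X_1 X_2 X_3 X_4] = E[X_1 X_2] E[X_3 X_4] + E[X_1 X_3] E[X_2 X_4] + E[X_1 X_4] E[X_2 X_3] \tag{16} $$

which finally leads us to:

$$ \sum_{j \in \eta_i} \sum_{k \in \eta_i} E\left[ (x_i - \mu)(x_i - \mu)(x_j - \mu)(x_k - \mu) \right] = \sum_{j \in \eta_i} \sum_{k \in \eta_i} \left[ \sigma^2 \sigma_{jk} + 2 \sigma_{ij} \sigma_{ik} \right] \tag{17} $$

We now proceed to the expansion of the second main term of (14). By expanding the square, multiplying the summations
and using the identity in (16), we have:

$$ E\left[ (x_i - \mu) \sum_{j \in \eta_i} \sum_{k \in \eta_i} \sum_{l \in \eta_i} (x_j - \mu)(x_k - \mu)(x_l - \mu) \right] = \sum_{j \in \eta_i} \sum_{k \in \eta_i} \sum_{l \in \eta_i} \left[ \sigma_{ij} \sigma_{kl} + \sigma_{ik} \sigma_{jl} + \sigma_{il} \sigma_{jk} \right] \tag{18} $$

Finally, the third term of (14) is given by:

$$ E\left[ \sum_{j \in \eta_i} \sum_{k \in \eta_i} \sum_{l \in \eta_i} \sum_{m \in \eta_i} (x_j - \mu)(x_k - \mu)(x_l - \mu)(x_m - \mu) \right] = \sum_{j \in \eta_i} \sum_{k \in \eta_i} \sum_{l \in \eta_i} \sum_{m \in \eta_i} \left[ \sigma_{jk} \sigma_{lm} + \sigma_{jl} \sigma_{km} + \sigma_{jm} \sigma_{kl} \right] \tag{19} $$

Therefore, combining all the parts, the complete expression for $\Phi_\beta$ (Fisher information for the pairwise GMRF spatial
dependence parameter given by the square of the score function) is given by the following equation:

$$ \Phi_\beta = \frac{1}{\sigma^4} \left\{ \sum_{j \in \eta_i} \sum_{k \in \eta_i} \left[ \sigma^2 \sigma_{jk} + 2 \sigma_{ij} \sigma_{ik} \right] - 2\beta \sum_{j \in \eta_i} \sum_{k \in \eta_i} \sum_{l \in \eta_i} \left[ \sigma_{ij} \sigma_{kl} + \sigma_{ik} \sigma_{jl} + \sigma_{il} \sigma_{jk} \right] + \beta^2 \sum_{j \in \eta_i} \sum_{k \in \eta_i} \sum_{l \in \eta_i} \sum_{m \in \eta_i} \left[ \sigma_{jk} \sigma_{lm} + \sigma_{jl} \sigma_{km} + \sigma_{jm} \sigma_{kl} \right] \right\} \tag{20} $$

which, in case of homogeneity of variance (homoscedasticity), $\sigma_i^2 = \sigma_j^2 = \sigma^2$, $\forall j \in \eta_i$, $i = 1, 2, \ldots, n$, is further simplified
to:

$$ \Phi_\beta = \sum_{j \in \eta_i} \sum_{k \in \eta_i} \left[ \rho_{jk} + 2 \rho_{ij} \rho_{ik} \right] - 2\beta \sum_{j \in \eta_i} \sum_{k \in \eta_i} \sum_{l \in \eta_i} \left[ \rho_{ij} \rho_{kl} + \rho_{ik} \rho_{jl} + \rho_{il} \rho_{jk} \right] + \beta^2 \sum_{j \in \eta_i} \sum_{k \in \eta_i} \sum_{l \in \eta_i} \sum_{m \in \eta_i} \left[ \rho_{jk} \rho_{lm} + \rho_{jl} \rho_{km} + \rho_{jm} \rho_{kl} \right] \tag{21} $$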

Note that all information concerning the spatial dependence structure conveyed by $\beta$ is provided by the covariance matrix
of the spatial configuration patterns (blocks or image patches) defined by the neighborhood system. In this sense, the proposed
expression shows an explicit connection between the patch-based data covariance matrix and the spatial dependence structure
(where each sample is a lexicographic version of a patch defined by the neighborhood system). In case of null cross-covariances
(uncorrelated observations), the Fisher information is zero, which is intuitive, since there is no induced spatial dependence
structure. In theory, such a situation describes the behavior of a white noise random field.

Following the same methodology, a closed-form expression for $\Psi_\beta$ can also be obtained. The next equation shows the
Fisher information for the pairwise GMRF spatial dependence parameter given by the negative of the second derivative of the
pseudo-likelihood function. Note that, unlike $\Phi_\beta$, $\Psi_\beta$ does not depend on $\beta$.

$$ \Psi_\beta = \frac{1}{\sigma^2} \sum_{j \in \eta_i} \sum_{k \in \eta_i} \sigma_{jk} \tag{22} $$

Thus, in case of homogeneity of variance, we have the following equality:

$$ \Psi_\beta = \sum_{j \in \eta_i} \sum_{k \in \eta_i} \rho_{jk} \tag{23} $$

Looking at the derived expressions, it is straightforward to note that a trivial condition for the information equality, $\Phi_\beta = \Psi_\beta$,
is $\beta = 0$ and $\sigma_{ij} = 0$, $\forall j \in \eta_i$, which are, essentially, equivalent conditions expressing no correlation between the random variables.
Obviously, in this case, Fisher information is zero since the observations bring no information about the spatial dependence
parameter. Besides, since $\hat{\beta}_{MPL}$ is inversely proportional to $\Psi_\beta$, which is a measure of how sure we are about $\beta$, the relationship
between $\hat{\beta}_{MPL}$ and $\Psi_\beta$ suggests that the accuracy of MPL estimation depends on the global spatial dependence structure of
the observed data. In Section 4 we derive an exact expression for the asymptotic variance of $\hat{\beta}_{MPL}$ and discuss conditions for
accuracy in GMRF model spatial dependence parameter estimation. The next Section discusses another important connection,
which is the relation between Fisher information, Shannon entropy and the parameter $\beta$ in pairwise GMRFs.
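
As a hedged numerical companion to the closed-form expressions, the Python sketch below evaluates both forms of expected Fisher information from a patch covariance matrix. It assumes the expansion obtained by applying Isserlis' theorem to a zero-mean Gaussian patch: $\Phi_\beta$ is $1/\sigma^4$ times a double sum of $\sigma^2 \sigma_{jk} + 2\sigma_{ij}\sigma_{ik}$, minus $2\beta$ times a triple sum, plus $\beta^2$ times a quadruple sum of covariance products, and $\Psi_\beta$ is $\sum_j \sum_k \sigma_{jk} / \sigma^2$. The matrix layout (index 0 for the central variable, indices 1 to $k$ for its neighbors) and the function names are assumptions of this example.

```python
from itertools import product

def expected_fisher_phi(cov, beta):
    """Phi_beta from a (k+1) x (k+1) patch covariance matrix.

    Assumed layout: index 0 is the central variable x_i,
    indices 1..k are its neighbors.
    """
    k = len(cov) - 1
    nb = range(1, k + 1)
    sigma2 = cov[0][0]
    # Second-order term: sum over pairs of neighbors.
    first = sum(sigma2 * cov[j][m] + 2.0 * cov[0][j] * cov[0][m]
                for j, m in product(nb, repeat=2))
    # Third-order term: sum over triples of neighbors.
    second = sum(cov[0][j] * cov[m][l] + cov[0][m] * cov[j][l] + cov[0][l] * cov[j][m]
                 for j, m, l in product(nb, repeat=3))
    # Fourth-order term: sum over quadruples of neighbors.
    third = sum(cov[j][m] * cov[l][q] + cov[j][l] * cov[m][q] + cov[j][q] * cov[m][l]
                for j, m, l, q in product(nb, repeat=4))
    return (first - 2.0 * beta * second + beta ** 2 * third) / sigma2 ** 2

def expected_fisher_psi(cov):
    """Psi_beta: summed neighbor covariances divided by sigma^2."""
    k = len(cov) - 1
    nb = range(1, k + 1)
    return sum(cov[j][m] for j, m in product(nb, repeat=2)) / cov[0][0]
```

For an identity covariance matrix and `beta = 0.0`, the two functions return the same value, which matches the trivial condition for information equality stated above. For other covariance structures the two values generally differ.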



III. Entropy on Pairwise GMRFs

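In the case of independent and identically distributed random variables $X_1, X_2, \ldots, X_n$, the global Shannon entropy is given
in terms of the expected value of the self-information, $H(X_i) = E[I(X_i)]$, where $I(X_i) = -\log p(X_i)$, by simply multiplying
$H(X_i)$ by the sample size $n$. In this Section we derive an expression for the global entropy by considering the pseudo-likelihood
function. Considering a pairwise GMRF model, the expression for the global entropy of $n$ spatially dependent random variables
is given by the following:

$$ H_\beta = -E\left[ \log PL(\theta) \right] = \frac{n}{2} \left[ \log\left(2\pi\sigma^2\right) + 1 \right] - \frac{n}{\sigma^2} \left[ \beta \sum_{j \in \eta_i} \sigma_{ij} - \frac{\beta^2}{2} \sum_{j \in \eta_i} \sum_{k \in \eta_i} \sigma_{jk} \right] \tag{24} $$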
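
Note that Shannon entropy is a quadratic function of the spatial dependence parameter $\beta$. Since the coefficient of the
quadratic term is non-negative (it is, up to a constant factor, the expected Fisher information), entropy is a convex function of $\beta$. Also, as
expected, when $\beta = 0$, the resulting expression is the entropy of $n$ i.i.d. Gaussian random variables. Thus, under homoscedasticity, the global entropy
reaches its minimum when:

$$ \frac{\partial H_\beta}{\partial \beta} = n \beta \sum_{j \in \eta_i} \sum_{k \in \eta_i} \rho_{jk} - n \sum_{j \in \eta_i} \rho_{ij} = 0 \implies \beta = \frac{\sum_{j \in \eta_i} \rho_{ij}}{\sum_{j \in \eta_i} \sum_{k \in \eta_i} \rho_{jk}} = \hat{\beta}_{MPL} \tag{25} $$

that is, among all possible values of $\beta$, the maximum pseudo-likelihood estimate $\hat{\beta}_{MPL}$ is the one that minimizes the global
Shannon entropy, which in this case will be equal to:

$$ H_\beta = \frac{n}{2} \log(2\pi) + \frac{n}{2} \log\left(\sigma^2\right) + \frac{n}{2} - \frac{n}{2} \hat{\beta}_{MPL} \sum_{j \in \eta_i} \rho_{ij} \tag{26} $$

showing that in the limiting case of completely correlated random variables, the global entropy reaches a lower bound, since
$\hat{\beta}_{MPL} = 1/k$ and $\rho_{ij} = 1$, $\forall j \in \eta_i$, given by:

$$ H_\beta = \frac{n}{2} \log(2\pi) + \frac{n}{2} \log\left(\sigma^2\right) \tag{27} $$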

Therefore, in this MRF model, maximum pseudo-likelihood estimation is closely related to entropy minimization. Since
entropy expresses a measure of disorder or randomness, this fact shows that, when we perform MPL estimation, we are
essentially choosing, among all the possible values of $\beta$, the estimate that minimizes the global uncertainty of the system. In
the next Section, we will move forward to see how Fisher information is also related to uncertainty in the estimation of $\beta$, by means
of the asymptotic variance of MPL estimators.
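
The relation between entropy minimization and the MPL estimate can be illustrated with a short Python sketch. It assumes the homoscedastic form of the global entropy, $H_\beta = \frac{n}{2}[\log(2\pi\sigma^2) + 1] - n[\beta \sum_j \rho_{ij} - \frac{\beta^2}{2} \sum_j \sum_k \rho_{jk}]$, obtained by taking the negative expectation of the log pseudo-likelihood. The function names and the way correlations are passed in (a list for $\rho_{ij}$ and a $k \times k$ matrix for $\rho_{jk}$) are illustrative assumptions.

```python
import math

def gmrf_entropy(n, sigma2, beta, rho_center, rho_neighbors):
    """Global entropy of n sites under the homoscedastic pairwise GMRF.

    rho_center:    list with the correlations rho_ij (center vs. each neighbor)
    rho_neighbors: k x k matrix with the correlations rho_jk among neighbors
    """
    a = sum(rho_center)                           # sum_j rho_ij
    b = sum(sum(row) for row in rho_neighbors)    # sum_j sum_k rho_jk
    iid_part = 0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)
    return iid_part - n * (beta * a - 0.5 * beta * beta * b)

def entropy_minimizer(rho_center, rho_neighbors):
    """Value of beta at which the quadratic entropy attains its minimum."""
    return sum(rho_center) / sum(sum(row) for row in rho_neighbors)

# Completely correlated first-order patch: the minimizer is 1/k = 0.25.
k = 4
print(entropy_minimizer([1.0] * k, [[1.0] * k for _ in range(k)]))  # 0.25
```

The minimizer has the same form as the MPL estimator written in terms of correlation coefficients, and evaluating `gmrf_entropy` at that value for the completely correlated patch gives $\frac{n}{2}\log(2\pi\sigma^2)$, the lower bound discussed above.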

IV. Asymptotic Variance of MPL Estimators 

Unbiasedness is not guaranteed by either ML or MPL estimation. Actually, there is no method that guarantees the existence of
unbiased estimators for a fixed $n$-size sample. Often, in the exponential family, MLEs coincide with UMVU (Uniform Minimum
Variance Unbiased) estimators because they are functions of complete sufficient statistics (if the MLE is unique, then it is a function
of sufficient statistics). Also, there are several characteristics that make ML estimation a reference method [13]-[15]. Making
the sample size grow infinitely ($n \to \infty$), the MLE becomes asymptotically unbiased and efficient. Unfortunately, there is no
result showing that the same occurs in MPL estimation.

A. Asymptotic Variance on the GMRF Model 

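Asymptotic evaluations uncover the most fundamental properties of a mathematical procedure, providing a powerful and
general tool for statistical analysis. In this Section we derive an expression for the asymptotic variance of the MPL estimator of
the pairwise GMRF spatial dependence parameter $\beta$. It is known from the statistical inference literature that both ML and MPL
estimators share two important properties: consistency and asymptotic normality [29], [30], making it possible to completely
characterize their behavior in the limiting case. In other words, $\hat{\beta}_{MPL} \approx N(\beta, \upsilon_\beta)$, where $\upsilon_\beta$ denotes the asymptotic
variance. It is known that the asymptotic covariance matrix of MPL estimators is given by [31]:

$$ C(\theta) = H^{-1}(\theta) J(\theta) H^{-1}(\theta) \tag{28} $$

with

$$ H(\theta) = E_\theta\left[ \nabla^2 \log PL(\theta) \right], \qquad J(\theta) = Var_\theta\left[ \nabla \log PL(\theta) \right] \tag{29} $$

where $H$ is the expected Hessian matrix of the log pseudo-likelihood function and $J$ is the covariance matrix of its gradient (score). Thus,
in the uniparametric case we have the following definition for the asymptotic variance $\upsilon_\beta$:

$$ \upsilon_\beta = \frac{Var\left[ \frac{\partial}{\partial \beta} \log PL(\theta) \right]}{E^2\left[ \frac{\partial^2}{\partial \beta^2} \log PL(\theta) \right]} = \frac{E\left[ \left( \frac{\partial}{\partial \beta} \log PL(\theta) \right)^2 \right] - E^2\left[ \frac{\partial}{\partial \beta} \log PL(\theta) \right]}{E^2\left[ \frac{\partial^2}{\partial \beta^2} \log PL(\theta) \right]} \tag{30} $$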
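
Noting that the expected value of the derivative of the log pseudo-likelihood equation is zero:

$$ E\left[ \frac{\partial}{\partial \beta} \log PL(\theta) \right] = 0 \tag{31} $$

we finally get the resulting expression for the asymptotic variance of $\hat{\beta}_{MPL}$ as the ratio between $\Phi_\beta$ and $\Psi_\beta^2$, given by:

$$ \upsilon_\beta = \frac{\sum_{j \in \eta_i} \sum_{k \in \eta_i} \left[ \sigma^2 \sigma_{jk} + 2 \sigma_{ij} \sigma_{ik} \right] - 2\beta \sum_{j \in \eta_i} \sum_{k \in \eta_i} \sum_{l \in \eta_i} \left[ \sigma_{ij} \sigma_{kl} + \sigma_{ik} \sigma_{jl} + \sigma_{il} \sigma_{jk} \right] + \beta^2 \sum_{j \in \eta_i} \sum_{k \in \eta_i} \sum_{l \in \eta_i} \sum_{m \in \eta_i} \left[ \sigma_{jk} \sigma_{lm} + \sigma_{jl} \sigma_{km} + \sigma_{jm} \sigma_{kl} \right]}{\left[ \sum_{j \in \eta_i} \sum_{k \in \eta_i} \sigma_{jk} \right]^2} \tag{32} $$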



Note that, since $\upsilon_\beta = \Phi_\beta / \Psi_\beta^2$, if we had equivalence between the two forms of Fisher information, that is, $\Phi_\beta = \Psi_\beta \neq 0$, the
expression for $\upsilon_\beta$ would simplify to the traditional Cramér-Rao lower bound. The interpretation of this equation indicates
that the accuracy in the estimation of $\beta$ depends essentially on two main factors: 1) minimization of $\Phi_\beta$, which means the variance
of the local log-likelihood functions is close to zero, and 2) maximization of $\Psi_\beta$, which essentially means that, on average,
the local log-likelihood functions are not flat, that is, small variations in the data cannot cause abrupt changes in the parameter
and, as a consequence, in the spatial dependence structure. Finally, since we cannot obtain a unique lower bound in terms of
Fisher information, because information equality fails, it is not possible to conclude that the estimator is asymptotically efficient, as
happens when we have independent observations (maximum-likelihood estimation).
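
The ratio $\Phi_\beta / \Psi_\beta^2$ can be evaluated compactly in Python. The sketch relies on an observation that is ours, not stated in the text: because the summation indices in the closed-form expression are independent, the multiple sums factor into products of $s = \sum_j \sigma_{ij}$ and $t = \sum_j \sum_k \sigma_{jk}$, giving a numerator $\sigma^2 t + 2s^2 - 6\beta s t + 3\beta^2 t^2$. The covariance layout (index 0 for the central variable) and the function name are assumptions of this example.

```python
def asymptotic_variance(cov, beta):
    """Return (variance, phi, psi) for the MPL estimator of beta.

    cov is a (k+1) x (k+1) patch covariance matrix; index 0 is the
    central variable, indices 1..k are its neighbors (assumed layout).
    Uses the factored form of the multiple sums (see lead-in).
    """
    k = len(cov) - 1
    sigma2 = cov[0][0]
    s = sum(cov[0][j] for j in range(1, k + 1))                          # sum_j sigma_ij
    t = sum(cov[j][m] for j in range(1, k + 1) for m in range(1, k + 1))  # sum_jk sigma_jk
    phi = (sigma2 * t + 2.0 * s * s - 6.0 * beta * s * t
           + 3.0 * beta ** 2 * t * t) / sigma2 ** 2
    psi = t / sigma2
    return phi / psi ** 2, phi, psi

# Uncorrelated patch with beta = 0: phi equals psi, so the variance is 1/psi.
identity = [[1.0 if a == b else 0.0 for b in range(5)] for a in range(5)]
print(asymptotic_variance(identity, 0.0))  # (0.25, 4.0, 4.0)
```

When `phi` and `psi` coincide, the returned variance reduces to `1 / psi`, the Cramér-Rao form. For a correlated patch the two quantities differ, and the variance has to be computed from the full ratio.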



V. Conclusions and Final Remarks 

In this paper we addressed the problem of characterizing the spatial dependence structure of a pairwise isotropic Gaussian
Markov Random Field by means of information-theoretic measures. Analytical expressions for observed and expected Fisher
information regarding the spatial dependence parameter in a GMRF model were derived using the pseudo-likelihood function,
elucidating the connection between $\beta$ and the distribution of patches in the observed data (patch-based data covariance matrix).
Intuitive geometrical interpretations for these quantities were discussed in the context of MRF models. However, to allow the
computation of these measures, a proper estimate of the $\beta$ parameter is required. Maximum pseudo-likelihood estimators for the
GMRF model parameters were derived, indicating that $\hat{\beta}_{MPL}$ is completely defined in terms of spatial correlation coefficients.
Using the same methodology, an expression for the Shannon entropy of a GMRF was derived, showing its relation with Fisher
information and maximum pseudo-likelihood estimation. Finally, using the derived expressions for the Fisher information, an
exact expression for the asymptotic variance of $\hat{\beta}_{MPL}$ in the GMRF model was presented, allowing a complete characterization
of its behavior in the limiting case. Future work includes an information-theoretic study of the spatial dependence structure
in Gaussian, Ising and Potts pairwise MRF models defined on lattices and non-regular graphs, Markov Chain Monte Carlo
simulation for temporal analysis of information-theoretic measures in the study of phase transitions in complex systems,
information geometry based approaches to MRF models, and also applications in signal, image and video processing, such as
the development of novel denoising and segmentation techniques.






VI. Acknowledgments

The author would like to thank CNPq (Brazilian Council for Research and Development) for the financial support through
the research grant number 475054/2011-3.

References 

[1] C. Shannon and W. Weaver, The Mathematical Theory of Communication. University of Illinois Press, Urbana, Chicago, IL & London, 1949.
[2] A. Rényi, "On measures of information and entropy," in Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, 1960, pp. 547-561.
[3] C. Tsallis, "Possible generalization of Boltzmann-Gibbs statistics," Journal of Statistical Physics, vol. 52, pp. 479-487, 1988.
[4] A. Bashkirov, "Rényi entropy as a statistical entropy for complex systems," Theoretical and Mathematical Physics, vol. 149, pp. 1559-1573, 2006.
[5] E. Jaynes, "Information theory and statistical mechanics," Physical Review, vol. 106, pp. 620-630, 1957.
[6] H. Grad, "The many faces of entropy," Communications in Pure and Applied Mathematics, vol. 14, pp. 323-354, 1961.
[7] R. Adler, A. Konheim, and A. McAndrew, "Topological entropy," Transactions of the American Mathematical Society, vol. 114, pp. 309-319, 1965.
[8] L. Goodwyn, "Comparing topological entropy with measure-theoretic entropy," American Journal of Mathematics, vol. 94, pp. 366-388, 1972.
[9] P. A. Samuelson, "Maximum principles in analytical economics," The American Economic Review, vol. 62, no. 3, pp. 249-262, 1972.
[10] M. Costa, "Writing on dirty paper," IEEE Transactions on Information Theory, vol. 29, no. 3, pp. 439-441, 1983.
[11] A. Dembo, T. Cover, and J. Thomas, "Information theoretic inequalities," IEEE Transactions on Information Theory, vol. 37, no. 6, pp. 1501-1518, 1991.
[12] T. Cover and Z. Zhang, "On the maximum entropy of the sum of two dependent random variables," IEEE Transactions on Information Theory, vol. 40, no. 4, pp. 1244-1246, 1994.
[13] E. L. Lehmann, Theory of Point Estimation. New York: Wiley, 1983.
[14] P. J. Bickel, Mathematical Statistics. New York: Holden Day, 1991.
[15] G. Casella and R. L. Berger, Statistical Inference, 2nd ed. New York: Duxbury, 2002.
[16] S. Amari and H. Nagaoka, Methods of Information Geometry (Translations of Mathematical Monographs, vol. 191). American Mathematical Society, 2000.
[17] R. E. Kass, "The geometry of asymptotic inference," Statistical Science, vol. 4, no. 3, pp. 188-234, 1989.
[18] A. Anandkumar, L. Tong, and A. Swami, "Detection of Gauss-Markov random fields with nearest-neighbor dependency," IEEE Trans. on Information Theory, vol. 55, no. 2, pp. 816-827, 2009.
[19] J. Moura and N. Balram, "Recursive structure of noncausal Gauss-Markov random fields," IEEE Trans. on Information Theory, vol. 38, no. 2, pp. 334-354, 1992.
[20] J. Moura and S. Goswami, "Gauss Markov random fields (GMRF) with continuous indices," IEEE Trans. on Information Theory, vol. 43, no. 5, pp. 1560-1573, 1997.
[21] J. Besag, "Spatial interaction and the statistical analysis of lattice systems," Journal of the Royal Statistical Society - Series B, vol. 36, pp. 192-236, 1974.
[22] J. Besag, "Statistical analysis of non-lattice data," The Statistician, vol. 24, no. 3, pp. 179-195, 1975.
[23] A. L. M. Levada, N. D. A. Mascarenhas, and A. Tannus, "On the asymptotic variances of Gaussian Markov random field model hyperparameters in stochastic image modeling," in Proceedings of the 19th International Conference on Pattern Recognition (ICPR). Tampa/FL: IEEE, 2008, pp. 1-4.
[24] J. Hammersley and P. Clifford, "Markov field on finite graphs and lattices," 1971, unpublished.
[25] R. Chellappa, Progress in Pattern Recognition 2. North Holland Pub. Co., 1985, ch. Two-dimensional discrete Gaussian Markov random field models for image processing, pp. 79-112.
[26] B. F. Efron and D. V. Hinkley, "Assessing the accuracy of the ML estimator: observed versus expected Fisher information," Biometrika, vol. 65, pp. 457-487, 1978.
[27] A. L. M. Levada and D. C. Correa, "An adaptive approach for contextual audio denoising using local Fisher information," in Proceedings of the 24th International Symposium on Circuits and Systems (ISCAS 2011), 2011, pp. 125-128.
[28] L. Isserlis, "On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables," Biometrika, vol. 12, pp. 134-139, 1918.
[29] J. Jensen and H. Künsch, "On asymptotic normality of pseudo likelihood estimates for pairwise interaction processes," Annals of the Institute of Statistical Mathematics, vol. 46, no. 3, pp. 475-486, 1994.
[30] G. Winkler, Image Analysis, Random Fields and Markov Chain Monte Carlo Methods: A Mathematical Introduction. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2006.
[31] G. Liang and B. Yu, "Maximum pseudo likelihood estimation in network tomography," IEEE Trans. on Signal Processing, vol. 51, no. 8, pp. 2043-2053, 2003.


